Rhythm based music segmentation and octave scale cepstral features for sung language recognition

نویسندگان

  • Namunu Chinthaka Maddage
  • Haizhou Li
چکیده

Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm based music segmentation in which the frame size varies in proportion to inter-beat interval of the music, in contrast to fixed length segmentation (FIX) in spoken language recognition. We show that acoustic feature extracted from the BSS scheme outperforms that from FIX. We also compare the effectiveness of musically motivated acoustic features, Octave scale cepstral coefficients (OSCCs) with Log frequency cepstral coefficients. We adopt Gaussian mixture model for sung language classifier design. Experiments are conducted on a database of 400 popular songs sung in four languages, including English, Chinese, German and Indonesian, which show that OSCC feature outperforms other features. We achieve 64.9% of sung language identification accuracy with Gaussian mixture models trained on shifteddelta-cepstral OSCC acoustic features extracted via BSS.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Mel Frequency Cepstral Coefficients for Music Modeling

We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs) the dominant features used for speech recognition and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale to model the spectra; and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spec...

متن کامل

Multiple Scale Music Segmentation Using Rhythm, Timbre, and Harmony

The segmentation of music into intro-chorus-verse-outro, and similar segments, is a difficult topic. A method for performing automatic segmentation based on features related to rhythm, timbre, and harmony is presented, and compared, between the features and between the features and manual segmentation of a database of 48 songs. Standard information retrieval performance measures are used in the...

متن کامل

WP 2.1.11. Blind Temporal Segmentation

An optimistic functional definition for the Music Browser would require segmentation of complex mixtures a song into: • segments corresponding to different rhythm patterns • segments corresponding to different timbral or textural patters • segments corresponding to different melodic/harmonic patterns • segments corresponding to different “structural” patterns (e.g. introduction, theme A, theme ...

متن کامل

Classification of Iranian Traditional Music Dastgahs Using Features Based on Pitch Frequency

The Iranian traditional music is composed of seven majors Dastgahs: Chahargah, Homayoun, Mahour, Segah, Shour, Nava, and Rast-Panjgah. In this paper, a new algorithm for the classification of the Iranian traditional music Dastgahs based on pitch frequency is proposed. In this algorithm, the features of Lagrange coefficients of pitch logarithm (LCPL), Fuzzy similarity sets type 2 (FSST2), and th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008